Thread: Regular expressions [Boost]

  1. #1
    Registered User
    Join Date
    May 2006
    Posts
    903

    Regular expressions [Boost]

    Hey guys. I've been reading about regular expressions for a few hours now because I thought they could do the job, but either I can't find the information I want or regular expressions aren't what I'm looking for.

    Basically, I want a function that searches a std::list<> of std::string objects and checks whether they match a search term. I don't want a function that checks only for exact matches; I want one that finds the closest possible matches, so that a mistake in the search field (such as transposed letters or mismatched case) still turns up the closest entries. From what I've read so far, regular expressions are not flexible enough for that. If someone can help me find a solution, I'd be more than happy.

    Thanks to everyone.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Ignoring case in regular expressions is usually just a matter of setting an "ignore case" flag. In Perl, that would be a trailing modifier, as in /abc/i.
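    As a rough C++ illustration of such a flag, here is a minimal sketch. It assumes std::regex from the C++11 standard library rather than Boost (boost::regex offers an equivalent boost::regex::icase flag), and the helper name matches_ignoring_case is made up for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`,
// ignoring letter case. std::regex::icase plays the role of Perl's /i.
bool matches_ignoring_case(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(text, re);
}
```

    Note that regex_match only succeeds when the entire string matches; a substring search would use regex_search instead.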

    Finding transposed character pairs is altogether harder. To match, say, "thier" as "their", you would need the pattern "th(ie|ei)r". If any pair can be transposed, that becomes impractical in a hurry.

    Perhaps Soundex is what you want:
    http://en.wikipedia.org/wiki/Soundex
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
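    For readers who want to try the Soundex idea, a minimal sketch of the common "American Soundex" variant follows. It assumes plain ASCII letters, and the function names are made up for the example. Two strings are treated as similar when their four-character keys are equal.

```cpp
#include <cctype>
#include <string>

// Soundex digit for a letter; '0' marks letters that carry no code
// (vowels, y, h and w).
char soundex_digit(char c)
{
    switch (std::tolower(static_cast<unsigned char>(c)))
    {
        case 'b': case 'f': case 'p': case 'v': return '1';
        case 'c': case 'g': case 'j': case 'k':
        case 'q': case 's': case 'x': case 'z': return '2';
        case 'd': case 't': return '3';
        case 'l': return '4';
        case 'm': case 'n': return '5';
        case 'r': return '6';
        default: return '0';
    }
}

// Hypothetical helper: returns the 4-character Soundex key of `word`,
// or an empty string when `word` contains no letters.
std::string soundex(const std::string& word)
{
    std::string key;
    char previous = '0';
    for (char c : word)
    {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isalpha(uc))
            continue;                       // skip spaces, digits, punctuation
        const char digit = soundex_digit(c);
        if (key.empty())
            key += static_cast<char>(std::toupper(uc));   // keep first letter
        else if (digit != '0' && digit != previous)
            key += digit;                   // new consonant code
        const int lower = std::tolower(uc);
        if (lower != 'h' && lower != 'w')
            previous = digit;               // h and w do not separate duplicates
        if (key.size() == 4)
            break;
    }
    if (!key.empty())
        key.append(4 - key.size(), '0');    // pad short keys with zeros
    return key;
}
```

    With this sketch "their" and "thier" produce the same key, but so do many unrelated words, so it is a coarse filter rather than a ranking.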

  3. #3
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    Tolerance of fuzzy input errors calls for a neural network. Maybe there is already a library out there with one suited to your purpose; I don't know.

  4. #4
    Registered User
    Join Date
    Dec 2006
    Posts
    17
    I would guess regular expressions are more than capable of handling your task; you'll just need to guide them a little.

    You need to establish exactly what you're expecting of this function before you start planning it really. Once you know precisely how you want it to work you can work out an approach.

    Ignoring case is easy. Matching "ie" or "ei" is easy (remember you can build regular expressions on the fly, at run time). Doing whatever you need is easy, really... you just have to decide what that "whatever" is and work it out.

  5. #5
    Registered User
    Join Date
    May 2006
    Posts
    630
    Maybe you should try to do it with boost::spirit.

  6. #6
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    I agree with Salem. Approximate matching is not the domain of regular expressions (and not of recursive descent parsers either, l2u), but that of Soundex keys, Levenshtein distances and similar algorithms.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law
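    To make the Levenshtein suggestion concrete, here is a sketch of how a closest-match search over a std::list<std::string> might look. It is one possible approach, not code from the thread: the helper names closest_match and to_lower are invented, case is ignored by lowercasing both sides (ASCII assumed), and ties go to the first entry found.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions that turn `a` into `b`. Uses two rows of the
// dynamic-programming table.
std::size_t levenshtein(const std::string& a, const std::string& b)
{
    std::vector<std::size_t> previous(b.size() + 1), current(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        previous[j] = j;                     // distance from empty prefix of a

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        current[0] = i;                      // distance to empty prefix of b
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            const std::size_t substitution =
                previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            current[j] = std::min({previous[j] + 1,      // deletion
                                   current[j - 1] + 1,   // insertion
                                   substitution});
        }
        previous.swap(current);
    }
    return previous[b.size()];
}

// Hypothetical helper: ASCII lowercase copy, so the comparison ignores case.
std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: returns the entry closest to `query`, or an empty
// string when the list is empty.
std::string closest_match(const std::list<std::string>& entries,
                          const std::string& query)
{
    std::string best;
    std::size_t best_distance = 0;
    bool found = false;
    for (const std::string& entry : entries)
    {
        const std::size_t d = levenshtein(to_lower(entry), to_lower(query));
        if (!found || d < best_distance)
        {
            best = entry;
            best_distance = d;
            found = true;
        }
    }
    return best;
}
```

    Plain Levenshtein counts a transposed pair as two edits; a variant such as Damerau-Levenshtein would count it as one, which may suit typo-tolerant search better.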

  7. #7
    Registered User
    Join Date
    May 2006
    Posts
    903
    Thank you guys. I decided just to stick with case-insensitive search which led me to this function:
    Code:
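    #include <string>
    #include <boost/regex.hpp>

    // Returns true when the whole of `element` matches the pattern `recherche`, ignoring case.
    bool TrouveSansCasse(const std::string& recherche, const std::string& element)
    {
    	boost::regex e(recherche, boost::regex::icase);
    	return boost::regex_match(element, e, boost::match_default);
    }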

  8. #8
    Registered User
    Join Date
    Jun 2006
    Posts
    13
    Maybe lexicographical_compare_3way would help you a little bit. Note that it is an SGI STL extension rather than part of the standard <algorithm> header.

  9. #9
    Registered User
    Join Date
    May 2006
    Posts
    903
    Thanks for the suggestion, I'll check it out. I think boost::regex is appropriate in this case. I found a way to make it find a phrase anywhere in a string regardless of case. I also added a small function that puts a backslash in front of every dot in the search string, so that "12.3" no longer matches "12A3".

    Here's the code:
    Code:
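    // Escapes every '.' in the query so it matches a literal dot, not any character.
    // Other regex metacharacters (*, +, ?, parentheses, brackets...) are left untouched.
    void Cours::AnnulerChRegex(std::string& requete)
    {
    	for(std::string::size_type i = 0; i < requete.size(); i++)
    	{
    		if(requete[i] == '.')
    			requete.insert(i++, "\\");	// i++ steps past the inserted backslash
    	}
    }
    
    // Returns true when `element` contains `recherche` anywhere, ignoring case.
    bool Cours::TrouveSansCasse(std::string recherche, std::string element)
    {
    	AnnulerChRegex(recherche);
    
    	// regex_match requires the whole string to match, hence the ".*" on both sides.
    	const std::string motif = ".*" + recherche + ".*";
    	boost::regex e(motif, boost::regex::icase);
    	return boost::regex_match(element, e, boost::match_default);
    }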
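
    The function above escapes only dots, so a query containing other metacharacters, such as "C++" or "(old)", would still be read as a pattern. A possible variation is sketched below. It assumes std::regex from C++11 instead of Boost and uses regex_search in place of the ".*" wrapping; the helper names escape_regex and contains_ignoring_case are invented for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: puts a backslash in front of every regex
// metacharacter so the query is treated as literal text.
std::string escape_regex(const std::string& query)
{
    static const std::string special = "\\^$.|?*+()[]{}";
    std::string escaped;
    for (char c : query)
    {
        if (special.find(c) != std::string::npos)
            escaped += '\\';
        escaped += c;
    }
    return escaped;
}

// True when `element` contains `query` as literal text, ignoring case.
// regex_search looks for a match anywhere, so no ".*" wrapping is needed.
bool contains_ignoring_case(const std::string& query, const std::string& element)
{
    const std::regex re(escape_regex(query),
                        std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(element, re);
}
```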
